Phoneme and sub-phoneme t-normalization for text-dependent speaker recognition

نویسندگان

  • Doroteo Torre Toledano
  • Cristina Esteve-Elizalde
  • Joaquín González-Rodríguez
  • Rubén Fernández Pozo
  • Luis A. Hernández Gómez
چکیده

1 Test normalization (T-Norm) is a score normalization technique that is regularly and successfully applied in the context of text-independent speaker recognition. It is less frequently applied, however, to text-dependent or textprompted speaker recognition, mainly because its improvement in this context is more modest. In this paper we present a novel way to improve the performance of T-Norm for text-dependent systems. It consists in applying score TNormalization at the phoneme or sub-phoneme level instead of at the sentence level. Experiments on the YOHO corpus show that, while using standard sentence-level T-Norm does not improve equal error rate (EER), phoneme and sub-phoneme level T-Norm produce a relative EER reduction of 18.9% and 20.1% respectively on a state-of-the-art HMM based textdependent speaker recognition system. Results are even better for working points with low false acceptance rates.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech

Speaker segmentation is widely used in many tasks such as multi-speaker detection and speaker tracking. The segmentation performance depends on the performance of speaker verification (SV) between two short utterances to a large extent, so the improvement of the SV performance for short utterances would give the segmentation performance a great help. In this paper, a method based on phoneme rec...

متن کامل

Phoneme recognition using acoustic events

This paper presents a new approach to phoneme recognition using nonsequential sub{phoneme units. These units are called acoustic events and are phonologically meaningful as well as recognizable from speech signals. Acoustic events form a phonologically incomplete representation as compared to distinctive features. This problem may partly be overcome by incorporating phonological constraints. Cu...

متن کامل

Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices

Differences in human vocal tract lengths can cause inter speaker acoustic variability in speech signals spoken by different speakers for the same textual version and due to these variations, the robustness of a speaker independent (SI) speech recognition system is affected. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reduce the affect of these...

متن کامل

Multilingual text-to-phoneme mapping

This paper introduces a novel approach for generating multilingual text-to-phoneme mappings for use in multilingual speech recognition systems. The multilingual mappings are based on the weighted outputs from a neural network text-to-phoneme model, trained on data mixed from several languages. The multilingual mappings used together with a branched grammar decoding scheme is able to capture bot...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008